
Latent Semantic Analysis


Assessing the Applicability of Natural Language Processing to Traditional Social Science Methodology: A Case Study in Identifying Strategic Signaling Patterns in Presidential Directives

LeMay, C., Lane, A., Seales, J., Winstead, M., Baty, S.

arXiv.org Artificial Intelligence

Our research investigates how Natural Language Processing (NLP) can be used to extract main topics from a larger corpus of written data, as applied to the case of identifying signaling themes in Presidential Directives (PDs) from the Reagan through Clinton administrations. Analysts and NLP both identified relevant documents, demonstrating the potential utility of NLP in research involving large written corpora. However, we also identified discrepancies between NLP and human-labeled results that indicate a need for more research to assess the validity of NLP in this use case. The research was conducted in 2023, and the rapidly evolving landscape of AI/ML means existing tools have improved and new tools have been developed; this research displays the inherent capabilities of a potentially dated AI tool in emerging social science applications.


Exploring Aviation Incident Narratives Using Topic Modeling and Clustering Techniques

Nanyonga, Aziida, Wasswa, Hassan, Turhan, Ugur, Joiner, Keith, Wild, Graham

arXiv.org Artificial Intelligence

Aviation safety is a global concern, requiring detailed investigations into incidents to understand contributing factors comprehensively. This study uses the National Transportation Safety Board (NTSB) dataset. It applies advanced natural language processing (NLP) techniques, including Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (pLSA), and K-means clustering. The main objectives are identifying latent themes, exploring semantic relationships, assessing probabilistic connections, and clustering incidents based on shared characteristics. This research contributes to aviation safety by providing insights into incident narratives and demonstrating the versatility of NLP and topic modelling techniques in extracting valuable information from complex datasets. The results, including topics identified from various techniques, provide an understanding of recurring themes. Comparative analysis reveals that LDA performed best with a coherence value of 0.597, followed by pLSA at 0.583, LSA at 0.542, and NMF at 0.437. K-means clustering further reveals commonalities and unique insights into incident narratives. In conclusion, this study uncovers latent patterns and thematic structures within incident narratives, offering a comparative analysis of multiple topic modelling techniques. Future research avenues include exploring temporal patterns, incorporating additional datasets, and developing predictive models for early identification of safety issues. This research lays the groundwork for enhancing the understanding and improvement of aviation safety by utilising the wealth of information embedded in incident narratives.


Comparative Analysis of Topic Modeling Techniques on ATSB Text Narratives Using Natural Language Processing

Nanyonga, Aziida, Wasswa, Hassan, Turhan, Ugur, Joiner, Keith, Wild, Graham

arXiv.org Artificial Intelligence

Improvements in aviation safety analysis call for innovative techniques to extract valuable insights from the abundance of textual data available in accident reports. This paper explores the application of four prominent topic modelling techniques, namely Probabilistic Latent Semantic Analysis (pLSA), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF), to dissect aviation incident narratives using the Australian Transport Safety Bureau (ATSB) dataset. The study examines each technique's ability to unveil latent thematic structures within the data, providing safety professionals with a systematic approach to gain actionable insights. Through a comparative analysis, this research not only showcases the potential of these methods in aviation safety but also elucidates their distinct advantages and limitations.


Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT

Shakil, Hassan, Mahi, Atqiya Munawara, Nguyen, Phuoc, Ortiz, Zeydy, Mardini, Mamoun T.

arXiv.org Artificial Intelligence

In the contemporary era characterized by a deluge of data, the intelligence community faces the challenge of information overload, needing to process vast amounts of information swiftly and effectively. The ability to generate succinct, clear, and actionable summaries from diverse data sources is crucial, as it often determines the success of strategic objectives in this information-rich environment. As the demand for systems capable of automating large-scale text summarization without compromising on quality or relevance intensifies, the role of such technologies becomes increasingly critical [Liu and Lapata, 2019]. Text summarization, a pivotal task within Natural Language Processing (NLP), has found widespread application across various domains, including news aggregation and the distillation of extensive documents into manageable summaries. The exponential growth in data underscores the utility of text summarization in enhancing content accessibility and comprehension, thus facilitating more efficient navigation through information landscapes [Chouikhi and Alsuhaibani, 2022].


The evolving of Data Science and the Saudi Arabia case. How much have we changed in 13 years?

Barahona, Igor

arXiv.org Machine Learning

This work conducts a comprehensive examination of data science vocabulary usage over the past 13 years. The investigation commences with a dataset comprising 16,018 abstracts that feature the term "data science" in the title, abstract, or keywords. The study identifies documents that introduce novel vocabulary and subsequently explores how this vocabulary has been incorporated into the scientific literature. To achieve these objectives, I employ techniques such as Exploratory Data Analysis, Latent Semantic Analysis, Latent Dirichlet Allocation, and N-grams Analysis. A comparison of scientific publications between overall results and those specific to Saudi Arabia is presented. Based on how the vocabulary is utilized, representative articles are identified.


Automated Code Extraction from Discussion Board Text Dataset

Saravani, Sina Mahdipour, Ghaffari, Sadaf, Luther, Yanye, Folkestad, James, Moraes, Marcia

arXiv.org Artificial Intelligence

This study introduces and investigates the capabilities of three different text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet Allocation, and Clustering Word Vectors, for automating code extraction from a relatively small discussion board dataset. We compare the outputs of each algorithm with a previous dataset that was manually coded by two human raters. The results show that even with a relatively small dataset, automated approaches can be an asset to course instructors by extracting some of the discussion codes, which can be used in Epistemic Network Analysis.


Unsupervised Broadcast News Summarization: A Comparative Study on Maximal Marginal Relevance (MMR) and Latent Semantic Analysis (LSA)

Ramezani, Majid, Shahryari, Mohammad-Salar, Feizi-Derakhshi, Amir-Reza, Feizi-Derakhshi, Mohammad-Reza

arXiv.org Artificial Intelligence

The methods of automatic speech summarization are classified into two groups: supervised and unsupervised. Supervised methods are based on a set of features, while unsupervised methods perform summarization based on a set of rules. Latent Semantic Analysis (LSA) and Maximal Marginal Relevance (MMR) are considered the most important and well-known unsupervised methods in automatic speech summarization. This study set out to investigate the performance of the two aforementioned unsupervised methods on summarization of Persian broadcast news transcriptions. The results show that in generic summarization LSA outperforms MMR, while in query-based summarization MMR outperforms LSA.
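As a rough illustration of the MMR criterion the paper evaluates, the following sketch greedily selects sentences that are relevant to a query while penalizing redundancy with sentences already chosen. The corpus, the TF-IDF representation, and the trade-off parameter lam are all illustrative assumptions, not the authors' Persian-news pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_summarize(sentences, query, k=2, lam=0.7):
    # Represent sentences and query in the same TF-IDF space
    vec = TfidfVectorizer()
    S = vec.fit_transform(sentences + [query])
    sims_query = cosine_similarity(S[:-1], S[-1]).ravel()  # relevance to query
    sims_pair = cosine_similarity(S[:-1])                  # sentence-sentence similarity

    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            # MMR: lam * relevance - (1 - lam) * redundancy
            redundancy = max((sims_pair[i][j] for j in selected), default=0.0)
            return lam * sims_query[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With lam close to 1 the selection is driven purely by query relevance; lowering it increasingly favors diversity among the selected sentences.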


Uncovering Hidden Meaning: A Beginner's Guide to Latent Semantic Analysis

#artificialintelligence

If you have ever worked with text data, you have likely encountered the challenge of dealing with high-dimensional and sparse data. One popular solution to this problem is latent semantic analysis (LSA), also known as latent semantic indexing (LSI). LSA is a technique for extracting latent (hidden) semantics from a collection of documents or text data. It does this by mapping the documents into a lower-dimensional space, where the relationships between the documents and the underlying concepts they represent can be more easily understood. One of the key benefits of LSA is that it can handle large amounts of data efficiently and is robust to noise and sparse data.
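The mapping described above is conventionally computed with a truncated singular value decomposition of the document-term matrix. A minimal sketch using scikit-learn's TruncatedSVD (a standard choice for sparse matrices) is below; the corpus and the number of components are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "cats and dogs are popular pets",
    "dogs chase cats in the yard",
    "stocks and bonds move the market",
    "the stock market fell on bond news",
]

# High-dimensional, sparse document-term matrix
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# LSA: project documents into a low-dimensional latent space via truncated SVD
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)

print(Z.shape)  # (4, 2): each document is now a 2-dimensional vector
```

Documents about the same latent concept (pets vs. finance here) end up close together in the reduced space, even when they share few exact terms.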


A comparison of latent semantic analysis and correspondence analysis of document-term matrices

Qi, Qianqian, Hessen, David J., Deoskar, Tejaswini, van der Heijden, Peter G. M.

arXiv.org Artificial Intelligence

Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singular value decomposition (SVD) for dimensionality reduction. LSA has been extensively used to obtain low-dimensional representations that capture relationships among documents and terms. In this article, we present a theoretical analysis and comparison of the two techniques in the context of document-term matrices. We show that CA has some attractive properties as compared to LSA, for instance that effects of margins, i.e. sums of row elements and column elements, arising from differing document-lengths and term-frequencies are effectively eliminated, so that the CA solution is optimally suited to focus on relationships among documents and terms. A unifying framework is proposed that includes both CA and LSA as special cases. We empirically compare CA to various LSA based methods on text categorization in English and authorship attribution on historical Dutch texts, and find that CA performs significantly better. We also apply CA to a long-standing question regarding the authorship of the Dutch national anthem Wilhelmus and provide further support that it can be attributed to the author Datheen, amongst several contenders.
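The margin-elimination property described above comes from CA's construction: it takes the SVD of standardized residuals rather than of the raw document-term matrix. The numpy sketch below follows the standard textbook construction on a toy count matrix; it is illustrative code, not the authors' implementation.

```python
import numpy as np

def ca(N, k=2):
    # Correspondence analysis of a count matrix N (documents x terms)
    P = N / N.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row margins (document lengths)
    c = P.sum(axis=0)                # column margins (term frequencies)
    # Standardized residuals: margin effects are divided out here
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of the rows (documents)
    rows = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]
    return rows, sv[:k]

N = np.array([[5., 1., 0.],
              [4., 2., 1.],
              [0., 1., 6.]])
coords, sv = ca(N)
```

Where plain LSA applies the SVD directly to (weighted) counts, CA first removes the expected values under independence, which is why differing document lengths and term frequencies do not dominate the solution.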


How to use Latent Semantic Analysis to classify documents

#artificialintelligence

The children were sitting in a circle on the floor. "The flat hat has a number and a label that says parrots and battercakes," one of the kids screamed. Every single child started laughing. "Nooooo, it was 'the black cat is under the table and it eats carrots and pancakes,'" another child replied. I realized only then that they were playing telephone (or broken telephone, as we call it in Argentina). Human communication is complex, mainly because each person expresses themselves differently. We could speak the same language but use different slang, words, or expressions to convey the same message.